Attention over Attention

TensorFlow implementation of the paper "Attention-over-Attention Neural Networks for Reading Comprehension".

Some context on my blog

In a cloze-style reading comprehension task, a word is removed from an article summary, and the model reads the article and tries to infer the missing word. This example works on the CNN news dataset.
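As a rough illustration of the task setup, the sketch below builds a cloze query from a toy summary. The `make_cloze` helper, the `@placeholder` marker, the `@entityN` tokens and the example sentences are all assumptions made for illustration; they are not taken from reader.py or from the dataset files.

```python
def make_cloze(summary_tokens, answer, placeholder="@placeholder"):
    """Replace every occurrence of `answer` in the summary with a placeholder."""
    if answer not in summary_tokens:
        raise ValueError("answer must appear in the summary")
    return [placeholder if tok == answer else tok for tok in summary_tokens]


# Toy, made-up example (hypothetical tokens, not real dataset content).
article = "@entity1 met @entity2 in @entity3 on monday".split()
summary = "@entity2 was visited by @entity1".split()

# The query is the summary with the answer word blanked out.
query = make_cloze(summary, "@entity1")

# The model reads `article` and `query`, then picks one of the article's
# entity tokens as its guess for the blanked-out word.
candidates = sorted({tok for tok in article if tok.startswith("@entity")})
```

Here `query` keeps the summary's word order, so the model can use the words around the blank as context when attending over the article.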

With the same hyperparameters as reported in the paper, this implementation reached an accuracy of 74.3% on both the validation and test sets, compared with the 73.1% and 74.4% reported by the authors.

To train a new model: python model.py --training=True --name=my_model

To test accuracy: python model.py --training=False --name=my_model --epochs=1 --dropout_keep_prob=1

Note that the .tfrecords and model files are stored with Git LFS.

Raw data for use with reader.py to produce .tfrecords files was downloaded from http://cs.nyu.edu/~kcho/DMQA/

Interesting parts

Masked softmax implementation
Example of batched sparse tensors with correct mask handling
Example of pointer style attention
Test/validation split part of the tf-graph
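The first and third items above can be sketched in a few lines of plain Python. This is a minimal illustration of the general techniques, not the TensorFlow code in model.py; the function names `masked_softmax` and `pointer_sum`, and the toy tokens and scores, are assumptions made for this example.

```python
import math


def masked_softmax(scores, mask):
    """Softmax over `scores` where positions with mask == 0 get probability 0."""
    valid = [s for s, m in zip(scores, mask) if m]
    if not valid:
        return [0.0] * len(scores)
    top = max(valid)  # subtract the max over valid positions for stability
    exps = [math.exp(s - top) if m else 0.0 for s, m in zip(scores, mask)]
    total = sum(exps)
    return [e / total for e in exps]


def pointer_sum(tokens, attention):
    """Pointer-style scoring: sum the attention of every position of a token."""
    totals = {}
    for tok, weight in zip(tokens, attention):
        totals[tok] = totals.get(tok, 0.0) + weight
    return totals


# Toy document padded to length 5; the last position is padding.
tokens = ["@entity1", "met", "@entity2", "@entity1", "<pad>"]
scores = [1.0, 0.0, 1.0, 1.0, 5.0]  # the padding score is large on purpose
mask = [1, 1, 1, 1, 0]

attention = masked_softmax(scores, mask)
word_scores = pointer_sum(tokens, attention)
prediction = max(word_scores, key=word_scores.get)
```

The padded position has the highest raw score but receives zero probability, which is the point of the mask. Because `@entity1` appears twice, its two attention weights add up, so the answer is chosen by pointing at document positions rather than by predicting over a full vocabulary.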

OlavHN/attention-over-attention
